Add opt-in OTel GenAI metrics (0067) by chris-colinsky · Pull Request #177 · LunarCommand/openarmature-python

chris-colinsky · 2026-06-22T18:05:14Z

Implements accepted proposal 0067 (spec v0.68.0): adds the OpenTelemetry metrics signal to the bundled OTel observer, opt-in via enable_metrics. The spec pin advances from v0.67.0 to v0.68.0 (0067 is the only proposal in the delta).

What changed

Two OA-namespaced histogram instruments over provider calls, recorded only when enable_metrics=True (default off):

openarmature.gen_ai.client.token.usage ({token}): per LLM completion, two observations, the input and output token counts (tagged openarmature.gen_ai.token.type), from the response usage record. Recorded only when the call returned usage.
openarmature.gen_ai.client.operation.duration (s): the provider-call wall-clock duration, recorded once per attempt under call-level retry, failed attempts included (a failed attempt also carries error.type).

Both carry openarmature.gen_ai.operation ("chat"), gen_ai.request.model, and gen_ai.system, with the spec's explicit bucket advisories. The Meter comes from the configured MeterProvider (injectable via meter_provider=...; falls back to the OTel global no-op when none is set). Metrics are independent of span emission: they record even with disable_llm_spans=True. Metrics target OTel only (no Langfuse mapping). The instrument names are OA-namespaced and mirror the upstream gen_ai.client.* instruments (still at Development status), so a future cutover is a mechanical prefix-strip.
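A minimal sketch of the recording rules described above, using only the standard library rather than the OTel SDK. The Attempt dataclass and the observations function are hypothetical stand-ins for illustration, not the observer's actual types, and the "input" / "output" values for the token-type attribute are an assumption borrowed from the upstream GenAI conventions.

```python
from dataclasses import dataclass
from typing import Optional

TOKEN_USAGE = "openarmature.gen_ai.client.token.usage"
OPERATION_DURATION = "openarmature.gen_ai.client.operation.duration"


@dataclass(frozen=True)
class Attempt:
    """Hypothetical record of one provider-call attempt (not the library's type)."""

    model: str
    system: str
    duration_s: float
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    error_type: Optional[str] = None


def observations(attempt: Attempt, enable_metrics: bool = False) -> list:
    """Return (instrument, value, attributes) tuples for one attempt."""
    if not enable_metrics:
        # Default off: nothing is recorded unless the flag is set.
        return []

    base = {
        "openarmature.gen_ai.operation": "chat",
        "gen_ai.request.model": attempt.model,
        "gen_ai.system": attempt.system,
    }

    # Duration is recorded for every attempt; a failed one also gets error.type.
    duration_attrs = dict(base)
    if attempt.error_type is not None:
        duration_attrs["error.type"] = attempt.error_type
    out = [(OPERATION_DURATION, attempt.duration_s, duration_attrs)]

    # Token usage is recorded only when the call returned a usage record.
    if attempt.input_tokens is not None and attempt.output_tokens is not None:
        for token_type, count in (
            ("input", attempt.input_tokens),    # assumed attribute value
            ("output", attempt.output_tokens),  # assumed attribute value
        ):
            attrs = {**base, "openarmature.gen_ai.token.type": token_type}
            out.append((TOKEN_USAGE, count, attrs))
    return out
```

In this sketch a successful attempt with usage yields three observations (one duration, two token counts), while a failed attempt yields a single duration observation tagged with error.type.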

Implementation note

The proposal sources metrics from the typed completion/failure events and requires duration "once per attempt". In this implementation the per-attempt event is the internal LlmRetryAttemptEvent (the LLM-span source since 0050), which already carries latency, usage, error category, model, and provider, so metrics record from it: one duration sample per attempt, token usage only when usage is present. The terminal events are not used (they would double-count). This is the same internal-event latitude the spec blessed for the 0050 per-attempt span surface.
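To illustrate why a per-attempt source gives "once per attempt" without double-counting, here is a rough standard-library sketch of a retry loop that reports each attempt to a callback. call_with_retry and on_attempt are hypothetical names chosen for this example; the real retry path and LlmRetryAttemptEvent are not shown in this PR description.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    provider_call: Callable[[], T],
    max_attempts: int,
    on_attempt: Callable[[float, Optional[str]], None],
) -> T:
    """Run provider_call up to max_attempts times, reporting every attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        start = time.perf_counter()
        try:
            result = provider_call()
        except Exception as exc:
            # A failed attempt still produces exactly one duration sample,
            # tagged with an error category (here: the exception class name).
            on_attempt(time.perf_counter() - start, type(exc).__name__)
            last_error = exc
            continue
        on_attempt(time.perf_counter() - start, None)
        return result

    # All attempts failed. No extra sample is emitted here, which is the
    # analogue of not recording from the terminal event.
    assert last_error is not None
    raise last_error
```

A call that fails twice and then succeeds reports three samples through on_attempt. Recording again from a terminal completion or failure event would add a fourth sample for the same work.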

Embedding metrics deferred

The proposal's embedding-call metrics (fixture 089) are deferred: the embedding capability (proposal 0059) is unimplemented in Python until a later release, so there is no embedding event or provider to record from. conformance.toml records 0067 as partial on that basis. The LLM-call fixtures (088 / 090 / 091) are implemented and wired through a private MeterProvider plus an in-memory MetricReader (the conformance-adapter metric-capture primitive); 089 rides the deferred set.

Tests

Unit: token + duration emission, error.type on failure, the disabled no-op, span-independence, and once-per-attempt-under-retry (asserting on histogram counts, since identical-dimension observations aggregate).
Conformance: 088 / 090 / 091 run; 089 skipped. The fixture-parser schema gained an expected.metrics field, its discriminator key, and a calls_embed node directive so 089 still round-trips.
Full suite green; ruff + pyright clean; mkdocs build --strict clean.
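The retry test asserts on histogram counts because observations with identical dimensions collapse into one data point. The toy aggregator below shows that behavior with the standard library only; TinyHistogram is a hypothetical illustration and not the OTel SDK's aggregation code.

```python
from collections import defaultdict


class TinyHistogram:
    """Toy histogram: one data point per distinct attribute set."""

    def __init__(self) -> None:
        self._points = defaultdict(lambda: {"count": 0, "sum": 0.0})

    def record(self, value: float, attributes: dict) -> None:
        # Attribute sets are unordered, so key on a frozenset of the items.
        key = frozenset(attributes.items())
        point = self._points[key]
        point["count"] += 1
        point["sum"] += value

    def points(self) -> dict:
        return dict(self._points)


attrs = {"gen_ai.request.model": "example-model", "gen_ai.system": "example"}
hist = TinyHistogram()
for duration in (0.5, 0.25, 0.25):  # three successful attempts, same dimensions
    hist.record(duration, attrs)
```

After the three records there is still a single data point, and only its count (3) reveals that three attempts were observed. An attempt with a different attribute set, such as one carrying error.type, would land in a separate point.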

The OTel observer can now emit the metrics signal alongside its spans: two histogram instruments over provider calls, gated by a new enable_metrics flag (default off, independent of span emission). One records an LLM completion's input and output token counts; the other records the call duration, once per attempt under call-level retry and including a failed attempt (which carries error.type). Both draw from the per-attempt LlmRetryAttemptEvent, the LLM-span source, so metrics record even with spans disabled. The Meter comes from the configured MeterProvider (injectable; falls back to the OTel global no-op when none is set). Implements proposal 0067 (observability metrics), LLM path.

Advance the spec pin v0.67.0 -> v0.68.0 across the four sync points (submodule, __spec_version__, pyproject, conformance manifest) and the smoke assertion; regenerate the bundled AGENTS.md. Wire conformance fixtures 088 / 090 / 091 through a new metrics driver that captures observations via a private MeterProvider plus an in-memory MetricReader (the conformance-adapter metric-capture primitive); the embedding fixture 089 is deferred until the embedding capability lands. Teach the fixture-parser schema the new shapes (expected.metrics and the calls_embed node directive). Record proposal 0067 partial, document the enable_metrics flag, and add the CHANGELOG entry.

Copilot

Pull request overview

Adds opt-in OpenTelemetry metrics emission to the bundled OTelObserver per accepted spec proposal 0067 (spec v0.68.0), alongside the usual span emission, and updates the spec pin + conformance harness to validate the new fixtures.

Changes:

Add enable_metrics + meter_provider to OTelObserver, creating/recording two OA-namespaced GenAI histogram instruments from LlmRetryAttemptEvent.
Extend conformance + unit tests to capture/validate emitted metrics via a private MeterProvider + InMemoryMetricReader, and add fixture-schema support (expected.metrics, calls_embed for deferred 089).
Bump pinned spec version from 0.67.0 → 0.68.0 across runtime, pyproject, conformance manifest, docs, and changelog.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/openarmature/observability/otel/observer.py` | Implements opt-in metrics instruments + per-attempt recording for duration/token usage. |
| `tests/unit/test_observability_otel.py` | Adds unit tests asserting metrics emission, disabling behavior, span-independence, and retry attempt counting. |
| `tests/conformance/test_observability.py` | Wires new metrics fixtures (088/090/091) and adds a metrics fixture driver/capture/assertion helpers. |
| `tests/conformance/harness/expectations.py` | Extends observability expected schema with `metrics`. |
| `tests/conformance/harness/directives.py` | Adds `calls_embed` directive shape so deferred embedding fixture 089 parses/round-trips. |
| `docs/concepts/observability.md` | Documents `enable_metrics`, instruments, dimensions, and meter-provider behavior. |
| `tests/test_smoke.py` | Updates spec-version assertion to 0.68.0. |
| `pyproject.toml` | Updates `[tool.openarmature].spec_version` to 0.68.0. |
| `src/openarmature/__init__.py` | Updates `__spec_version__` to 0.68.0. |
| `src/openarmature/AGENTS.md` | Updates bundled agent-doc header to spec v0.68.0. |
| `conformance.toml` | Advances `spec_pin` and records proposal 0067 as `partial` with rationale. |
| `CHANGELOG.md` | Adds release note entry for OTel GenAI metrics and updates spec-pin summary. |

chris-colinsky added 2 commits June 22, 2026 10:57

Copilot AI review requested due to automatic review settings June 22, 2026 18:05

Copilot started reviewing on behalf of chris-colinsky June 22, 2026 18:05

Copilot AI reviewed Jun 22, 2026

Comment thread src/openarmature/observability/otel/observer.py

chris-colinsky merged commit 4c7198f into main Jun 22, 2026
7 checks passed

chris-colinsky deleted the feature/0067-genai-metrics branch June 22, 2026 18:17
